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Value-at-Risk (VaR) and Conditional- Value-at-Risk (CVaR) are two risk measures which are 
widely used in the practice of risk management. This paper deals with the problem of estimating 
' both VaR and CVaR using stochastic approximation (with decreasing steps): we propose a 

first Robbins-Monro (RM) procedure based on Rockafellar-Uryasev's identity for the CVaR. 
Pi' Convergence rate of this algorithm to its target satisfies a Gaussian Central Limit Theorem. As 

, a second step, in order to speed up the initial procedure, we propose a recursive and adaptive 

importance sampling (IS) procedure which induces a significant variance reduction of both VaR 
and CVaR procedures. This idea, which has been investigated by many authors, follows a new 
approach introduced in |27| . Finally, to speed up the initialization phase of the IS algorithm, 
Q^l we replace the original confidence level of the VaR by a slowly moving risk level. We prove that 

the weak convergence rate of the resulting procedure is ruled by a Central Limit Theorem with 
, minimal variance and its efficiency is illustrated on several typical energy portfolios. 

> • 

Keywords: VaR, CVaR, Stochastic Approximation, Robbins-Monro algorithm. Importance 
0^ ' Samphng, Girsanov. 
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CN ■ 1 Introduction 
00 

I Following financial institutions, energy companies are developing a risk management framework 
to face the new price and volatility risks associated to the growth of energy markets. Value- 
at-Risk (VaR) and Conditional Value-at-Risk (CVaR) are certainly the best known and the most 
^ I common risk measures used in this context, especially for the evaluation of extreme losses potentially 
' faced by traders. Naturally related to rare events, the estimation of these risk measures is a 
numerical challenge. The Monte Carlo method, which is often the only available numerical device 
in such a general framework, must always be associated to efficient reduction variances techniques 
to encompass its slow convergence rate. In some specific cases, Gaussian approximations can lead to 
semi-closed form estimators. But, if these approximations can be of some interest when considering 
the yield of a portfolio, they turn out to be useless when estimating e.g. the VaR on the EBITDA 
(Earnings Before Interest, Taxes, Depreciation, and Amortization) of a huge portfolio as it is often 
the case in the energy sector. 

In this article, we introduce an alternative estimation method to estmate both VaR and CVaR, 
relying on the use of recursive stochastic algorithms. By definition, the VaR at level a £ (0, 1) 
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(VaRa) of a given portfolio is the lowest amount not exceeded by the loss with probability a (usually 
a > 95%). The Conditional Value-at-Risk at level a (CVaRc) is the conditional expectation of 
the portfolio losses beyond the VaRc level. Compared to VaR, CVaR is known to have better 
properties. It is a coherent risk measure in the sense of Artzner, Delbaen, Eber and Heath, see [2]. 

The most commonly used method to compute VaR is the inversion of the simulated empirical 
loss distribution function using Monte Carlo or historical simulation tools. The historical simula- 
tion method usually assumes that the asset returns in the future are independent and identically 
distributed, having the same distribution as they had in the past. Over a time interval [t,T], the 
loss is defined by L := V{St,t) — V{St + AS, T), where St denotes the market price vector observed 
at time t, AS = St — St the variation of S over the time interval [t, T] -which can be calculated 
using historical data- and V{St,t) the portfolio value at time t. The distribution of this loss L can 
be computed with the corresponding VaR at a given probability level by the inversion of the empir- 
ical function method. However, when the market price dynamics follow a general diffusion process 
solution of a stochastic differential equation (SDE), the assumption of asset returns independence 
is no longer available. 

To circumvent this problem, Monte Carlo simulation tools are generally used. Another widely 
used method relies on a linear (Normal approximation) or quadratic expansion (Delta-Gamma ap- 
proximation) and assume a joint normal (or log-normal) distribution for A^*. The Normal approxi- 
mation method gives L a normal distribution, thus the computation of the VaRo, is straightforward. 
However, when there is a non-linear dependence between the portfolio value and the prices of the 
underlying assets (think of a portfolio with options) such approximation is no longer acceptable. 
The Delta-Gamma approximation tries to capture some non linearity by adding a quadratic term 
in the loss expansion. Then, it is possible to find the distribution of the resulting approximation in 
order to obtain an approximation of the VaR. For more details about these methods, we refer to [7|, 
[8], [15], [16] and [3l]. Such approximations are no longer acceptable when considering portfolios 
with long maturity (T — t = 1 year up to 10 years) or when the loss is a functional of a general 
path-dependent SDE. 

In the context of hedging or optimizing a portfolio of financial instruments to reduce the CVaR, 
it is shown in [33] that it is possible to compute both VaR and CVaR (actually calculate VaR and 
optimize CVaR) by solving a convex optimization problem with a linear programming approach. It 
consists in generating loss scenarios and then in introducing constraints in the linear programming 
problem. Although they address a different problem, this method can be used to compute both 
VaR and CVaR. The advantage of such a method is that it is possible to estimate both VaR and 
CVaR simultaneously without assuming that the market prices have a specified distribution (e.g. 
normal, log-normal, ...). The main drawback is that the dimension (number of constraints) of the 
linear programming problem to be solved is equal to the number of simulated scenarios. In our 
approach, we are no limited by the number of generated sample paths used in the procedure. 

The idea to compute both VaR and CVaR with one procedure comes from the fact that they 
are strongly linked as they appear as the solutions and the value of the same convex optimisation 
problem (see Proposition 2.1) as pointed out |33j . Moreover both the objective function of the 
minimization problem and its gradient read as an expectation. This leads us to define consistent 
and asymptotically normal estimators of both quantities as the limit of a global Robbins-Monro 
(RM) procedure. Consequently, we are no longer constrained by the number of samples paths used 
in the estimation. 

A significant advantage of this recursive approach, especially in regard to the inversion of the 
empirical function method is that we only estimate the quantities of interest and not the whole 
inverse of the distribution function. Furthermore, we do not need to make approximations of the 
loss or of the convex optimization problem to be solved. Moreover, the implementation of the 
algorithm is straightforward. However to make it really efficient we need to modify it owing to 
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the fact that VaR and CVaR computation is closely related to the simulation of rare events. That 
is why as a necessary improvement, we introduce a (recursive and adaptive) variance reduction 
method based on an importance sampling (IS) paradigm. 

Let us be a bit more specific. Basically in this kind of problem we are interested in events 
that are observed with a very low probability (usually less that 5%, 1% or even 0.1%) so that we 
obtain few significant replications to update our estimates. Actually, interesting losses are those 
that exceed the VaR, i.e. the ones that are "in the tail" of the loss distribution. Thus in order to 
compute more accurate estimates of both quantities of interest, it is necessary to generate more 
samples in the tail of L, the area of interest. A general tool used in this situation is IS. 

The basic principle of IS is to modify the distribution of L by an equivalent change of measure 
to obtain more "interesting" samples that will lead to better estimates of the VaR and CVaR. The 
main issue of IS is to find a right change of measure (among a parameterized family) that will 
induce a significant variance reduction. In [16] and [T7], a change of measure based on a large 
deviation upper bound is proposed to estimate the loss probability P(L > x) for several values of x. 
Then, it is possible to estimate the VaR by interpolating between the estimated loss probabilities. 

Although this approach provides an asymptotically optimal IS distribution, it is strongly based 
on the fact that the Delta-Gamma approximation holds exactly and relies on the assumption that, 
conditionally to the past data, market moves are normally distributed. Moreover, as shown in 
[18j . importance sampling estimators based on a large deviations change of measure can have 
variance that increases with the rarity of the event, and even infinite variance. In |12j . the VaR^ is 
estimated by using a quantile estimator based on the inversion of the empirical weighted function 
and combined with Robbins-Monro (RM) algorithm with repeated projection devised to produce 
the optimal measure change for IS purpose. This kind of IS algorithm is known to converge toward 
the optimal importance sampling parameter only after a (long) stabilization phase and provided 
that the compact sets have been appropriately specified. By contrast, our parameters are optimized 
by an adaptive unconstrained (i.e. without projections) RM algorithm naturally combined with our 
VaR-CVaR procedure. 

One major issue that arises when combining the VaR-CVaR algorithm with the recursive IS 
procedure is to ensure that the IS parameters do move appropriately toward the critical risk area. 
They may remain stuck at the very beginning of the IS procedure. To circumvent this problem, we 
make the confidence level slowly increase from a low level (say 50%) to a by introducing a deter- 
ministic sequence {an)n>o of confidence level that converges toward a. This kind of incremental 
threshold increase has been proposed previously [22] in a different framework (use of cross entropy 
in rare event simulation) . It speeds up the initialization phase of the IS algorithm and consequently 
improves the variance reduction. Thus, we can truly experiment asymptotic convergence results in 
practice. 

The paper is organized as follows. In the next section, we present some theoretical results about 
VaR and CVaR. We introduce the VaR-CVaR stochastic algorithm in its first and naive version 
and study its convergence rate. We also introduce some background about IS using stochastic 
approximation algorithm. Section 3 is devoted to the design of an optimal procedure using an 
adaptive variance reduction procedure. We present how it modifies the asymptotic variance of our 
first CLT. In Section 4 we provide some extensions to the exponential change of measure and to 
deal with the case of infinite dimensional setting. Section 5 is dedicated to numerical examples. 
We propose several portfolios of options on several assets in order to challenge the algorithm and 
display variance reduction factors obtained using the IS procedure. To prevent the freezing of the 
algorithm during the first iterations of the IS procedure, we also consider a deterministic moving 
risk level an which replace a to speed up the initialization phase and improve the reduction of 
variance. We prove theoretically that modifying in this way the algorithm doesn't change the 
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previous CLT and fasten the convergence. 

Notations: • |.| will denote the canonical Euclidean norm on R*^ and (., .) will denote the canonical 
inner product. 

• — ^ will denote the convergence in distribution and will denote the almost sure convergence. 

• := max(0,x) will denote the positive part function. 

2 VaR, CVaR using stochastic approximation and some back- 
ground on recursive IS 

It is rather natural to consider that the loss of the portfolio over the considered time horizon can 
be written as a function of a structural finite dimensional random vector, i.e. L = (p{X), where X 
is a M -valued random vector defined on the probability space {Q, A, P) and : M"* ^ M is a Borel 
function, ip is the function representing the composition of the portfolio which remains fixed and 
X is a structural random vector used to model the market prices over the time interval; therefore 
we do not need to specify the dynamics of the market prices and only rely on the fact that it is 
possible to sample from the distribution of X. For instance, in a Black-Scholes framework, X is 
a Gaussian vector and f can be a portfolio of vanilla options. In more sophisticated models or 
portfolio, X can be a vector of Brownian increments related to the Euler scheme of a diffusion. 
The VaR at level a G (0, 1) is the lowest a-quantile of the distribution f{X) i.e.: 

VaR„(99(X)) := inf | P((^(X) < > «} • 

Since lim^^^^F {lp{X) < ^) = 1, we have | ¥ {ip{X) < ^) > a} ^ 0. Moreover, we have 
lim^^_oo P (fix) < £,) = 0, which implies that | P {ip{X) < S,) > a} is bounded from below 
so that the VaR always exists. We assume that the distribution function of f{X) is continuous 
(i.e. without atoms) so that the VaR is the lowest solution of the equation: 

¥ {ip{X) < = a. 

Three values of a are commonly considered: 0.95, 0.99, 0.995 so that it is usually close to 1 and 
the tail of interest has probability 1 — a. If the distribution function is (strictly) increasing, the 
solution of the above equation is unique, otherwise, there may be more than one solution. In fact, 
in what follows, we will consider that any solution of the previous equation is the VaR. Another 
risk measure generally used to provide information about the tail of the distribution of f{X) is the 
Conditional Value-at-Risk (CVaR) (at level a). As soon as ip{X) £ Li(P), it is defined by: 

CVaR„(99(X)) := E [ip{X)\ip{X) > VaRMX))] . 

The CVaR of ^(X) is simply the conditional expectation of ^(X) given that it lies inside the critical 
risk area. To capture more information on the conditional distribution of ^{X), it seems natural 
to consider more general risk measures like for example the conditional variance. In a more general 
framework we can be interested in estimating the ^-Conditional Value at Risk (^f-CVaR) (at level 
a) where ^' : M — > R is a continuous function. As soon as ^'((/^(A')) G L^(P), it is defined by: 

^-CYbRMX)) := E [^{ip{X))\^{X) > VaRMX))] . (1) 

When ^ = Id and 99(A) G L^{¥), ^ is the regular CVaR of ip{X). When ^ = x x^, equation ^ 
is but the conditional quadratic norm of f{X). 
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2.1 Representation of VaR and "iZ-CVaR as expectations 

The idea to devise a stochastic approximation procedure to compute VaR and CVaR, and more 
generally the ^-CVaR, comes from the fact that these two quantities are solutions of a convex 
optimization problem whose value function can be represented as an expectation as pointed out by 
Rockafellar and Uryasev in |32j . 

Proposition 2.1. Let V and V^sj be the functions defined by: 

V{0=^H^,X)] and V^{0=^M^,X)] (2) 

where 

v{C,x):=^+-^{^{X)-^U ^nd w{^,x):=^iO + -^i^{vix))-^iO)lM.)>^}. (3) 
1 — a 1 — a ' ' 

Suppose that the distribution function of ip{X) is continuous and that ^{X) G L^{F). Then, the 
function V is convex, differentiable and the VaRa((/9(X)) is any point of the set: 

argminy = E M | V'{0 = O} = U I ^{^{X) <0= «}> 
where V' is the derivative of V defined for every G M 6y 

dv 



= E 

Furthermore, 



(4) 



CYaRJ(p(X)) = miny(0 
and, if is continuous and that "iz {ip{X)) G -^^"'^(IP), for every ^* G argminF (i.e., ^* is a 

^-CVaR,(v?(X)) = V^iC). 

Proof. Since the functions ^ {f{x) — ^)+, x G M*^, are convex, the function V is convex. ¥{du))- 
a.s., exists at every ^ G M and 



¥{dw)-a.s., 



dv 



< 1 V 



1 — a 



Thanks to Lebesgue Dominated Convergence Theorem, one can interchange differentiation and 
expectation, so that V is differentiable with derivative l^'(C) = 1 — j^¥{ip{X) > ^) and reaches 
its absolute minimum at any satisfying ¥{{p{X) > (^*) = 1 — a i.e. ¥{{p{X) < ^*) = a. 
Moreover, it is clear that: 

Via = + ^t^'^^^) - ™ 



F{^{x) > e) 
cnux)H^]+nMx)-n+] 
F{^{x) > e) 

= E[^iX)\^iX) > C] 

and, in the same way, = ^'-CVaRa((/9(X)). This completes the proof. □ 
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Remark: Actually, one could consider a more general framework by including any risk measure 
defined by an integral representation with respect to X: 

E[A{e,X)] 

where A is a (computable) Borel function. 



2.2 Stochastic gradient and its adaptive companion procedure: a first naive 
approach 

The above representation ^ naturally yields a stochastic gradient procedure derived from the 
convex Lyapunov function V which will (hopefully) converge toward ^* := VaRa{^{X)). Then, a 
recursive companion procedure based on ([2]) can be easily devised having C* := 'if-CYaRai'f{X)) as 
target. There is no reason to believe that this first version can do better than the empirical quantile 
estimate. But, it is a necessary phase in order to understand how our recursive IS algorithm (to 
be devised further on) can be combined with this first procedure. 
First we set 

dv 1 
Hiitx) := Q^itx) = 1 - Y3^1{¥'W>5}' (^) 

so that, 

V'{0=^[Hi{^,X)]. 

Since we are looking for ^ for which E [Hi{^, X)] = 0, we implement a stochastic gradient descent 
derived from the Lyapunov function V to approximate ^* := VaRa{^{X)), i.e., we use the RM 
algorithm: 

^n=^n-l-7n^l(^n-l,^n),n>l, ^0 ^ L\F), (6) 

where {Xn)n>i is an i.i.d. sequence of random variables with the same distribution as X, inde- 
pendent of ^0, with ]E[|^o|] < +00 and (7n)n>i is a deterministic step sequence (decreasing to 0) 
satisfying: 

^7n = +oo and ^7.^<-l-oo. (Al) 

n>l n>l 

In order to derive the a.s. convergence of ([6]) we introduce the following additional assumption on 
the distributions of ip{X) and ^{ip{X)). Let a > 0, 

(p{X) has a continuous distribution function and ^{(p{X)) € L'^°(P). {A2)a 

Actually, Equation ([6]) can be seen either as a regular RM procedure with mean function V' since 
it is increasing (see e.g. [10] p. 50 and p. 66) or as a recursive gradient descent procedure derived 
from the Lyapunov function V. Both settings yield the a.s. convergence toward its target To 
establish the a.s. convergence of (^n)n>i (and of our different RM algorithms), we will rely on the 
following theorem. For a proof of this slight extension of Robbins-Monro Theorem and of the a.s. 
convergence of (^n)n>i (under assumptions (Al) and {A2)i), we refer to [13]. 

Theorem 2.2. (Robbins-Monro Theorem (variant)). Let : M*^ x R*^ — > M"^ 6e a Borel function 
and X be an W^-valued random vector such that E[|i/(z, X)|] < oo for every z G W^. Then set 

VzeR'', h{z) =W.[H{z,X)]. 
Suppose that the function h is continuous and that T* := {h = 0} satisfies 

G M'^ \T*,Vz* G T*, (z- z*,/i(z)) > 0. (7) 
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Let (7n)n>i be a deterministic step sequence satisfying condition (Al). Suppose that 

yz e M^, E[\H{z, X)\^] < C(l + |z|2) (8) 
(which implies that \h{z)\ < C'{1 + \z\)). 

Let {Xn)n>i be an i.i.d. sequence of random vectors having the distribution of X, let zq be a random 
vector independent of {Xn)n>i satisfying lK[\zo\] < oo, all defined on the same probability space 
{0,,A,¥). Let Tn '■= (^{zq, Xi, Xn) and let {rn)n>i be an Tn-i^^o-surable remainder sequence 
satisfying 

^7nkn|^ < OO. (9) 

n 

Then, the recursive procedure defined for n > 1 by 

Zn = Zn-l - JnH{Zn-l,Xn) + Tn^^, 

satisfies: 

3 Zoo, such that Z„ Zoo and z^o G T* a.s. 

The convergence also holds in L^(P),p £ (0,2), where L^(P) denotes the set of all random vectors 
defined on {n,A,F) such that E[\X\P]p < oo. 

Remark: It is in fact a slight variant (see e.g. |13| ) of the regular RM Theorem since Z„ converges 
to a random vector having its value in the set {h = 0} even if {h = 0} is not reduced to a singleton 
or a finite set. The remainder sequence in the above theorem plays a crucial role when we will 
(slightly) modify the first IS procedure to improve its efficiency. 

The second step concerns procedure for the numerical computation of the ^'-CVaR^. A naive 
idea is to compute the function at the point 

^'-CVaRa = V^iC) = E[w{C,X)] 
using a regular Monte Carlo simulation, 

^ n— 1 

-Y,He,Xk+i). (10) 

k=0 

However, we first need to get from ([6]) a good approximate of ^* and subsequently to use another 
sample of the distribution X. A natural idea is to devise an adaptive companion procedure of the 
above quantile search algorithm by replacing ^* in ([TO]) by its approximation at step k, namely 

^ n— 1 

Cn = -y^w{^k,Xk+i), n>l, Co = 0. (11) 

k=0 

Hence, {Cn)n>o is the sequence of empirical means of the non i.i.d. sequence (tt^(Cfc, -'^fc+i))fc>i, 
which can be written recursively: 

Cn = Cn^l H2 {(,n-l,Cn-l, Xn) , n > 1, (12) 

n 

where H2 c, x) := c — w{^, x). 

At this stage, we are facing two procedures {(,n,Cn) with different steps. This may appear not 
very consistent or at least natural. A second modification to the original Monte Carlo procedure (I12p 
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consists in considering a general step /3„ satisfying condition (Al) instead of ^ (with in mind the 
possibihty to set /?„ = eventuahy). This leads to: 

Cn = Cn-l - PnH2 (Cn-1, Cn-l,Xn) , n > 1. (13) 

In order to prove the a.s. convergence of (C„)„>i toward C*, we set for convenience /3o := 
sup„>i Pn + 1- Then, one defines recursively a sequence {An)n>i by 

A„+i = A„— — , n > 0, Ao = 1. 

Pn PO — Pn+1 

Elementary computations show by induction that 

A " 

Pn = Po^, n > 0, with Sn = Y^ Afc. (14) 



log(5„) - log(5„_i) = - log ( 1 - ^"j > ^" 



Furthermore, it follows from (|14p that for every n > 1 
Consequently, 

1 " 

log(50>^5^/?fc 

which implies that lim„ Sn = +00. 

Now using (jl3p and (jl4p . one gets for every n > 1 

'S'nC'n = S'„_iC„_i + A„ {ANn+1 + ^'^(Cn)) 

where, AA^„ := u;(^„_i,X„) — T/^(^„_i), n > 1, define a martingale increments sequence with 
respect to the natural filtration of the algorithm Tn ■= cr{S,o, Xi, - ■ ■ , Xn), n > 0. Consequently, 



^ /n— 1 n — 1 

" \fc=0 A:=0 



The second term in the right hand side of the above equality converges to V^{^*) = ^-CYaRa{(p{X)) 
owing to the continuity of at ^* and Cesaro's Lemma. 

The convergence to of the first term will follow from the a.s. convergence of the series 

n 

k=i 

by the Kronecker Lemma since /?„ = Po^. The sequence {Nn)n>i is an .F„-martingale since the 
AA'^fc's are martingale increments and 



E [(ATVn) Vn-i] < J^^^^ (^)) - ^ (0) 



The continuity of ^ at assumption {A2)i and the a.s. convergence of toward ^* imply that 

supE[(AiV;„)^|j:„_i] < 00 a.s. 

n>l 
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Consequently, assumption (Al) implies 



(iV^)oo = ^/3^E[(AiV„)2|^„„i] < oo 

n>l 

which in term yields the a.s. convergence of {Nn)n>i, so that Cn "^-CVaRa {(p{X)) . 
The resulting algorithm reads as for n > 1: 

f Cn = Cn-l-lnHiiCn-l,Xn), £ L\F), 

\ (15) 
{ Cn = Cn-1 — l3nH2 (C„,-l , C„„i , X„) , Cq = 0, 

and converges under {Al) and {A2)\. 

The question of the joint weak convergence rate of (^n, Cn) is not trivial owing to the coupling 
of the two procedures. The case of two different step scales refers to the general framework of two- 
time-scale stochastic approximation algorithms. Several results have been established by Borkar 
in [5], Konda and Tsitsiklis in [21] but the more relevant in our case are those of Mokkadem and 
Pelletier in [30]. The weak convergence rate of (^n)n>i is ruled by the CLT for "regular" (single- 
time scale) stochastic approximation algorithms (we refer to Kushner and Clark in [23], Metivier 
and Priouret in [4], Duflo in [lOj among others). In order to achieve the best asymptotic rate 
of convergence, one ought to set 7n = -7^ where the choice of 70 depends on the value of the 
density fip[x) of '■fiX) at which is unknown. To circumvent the difficulties induced by the 
specification of 70, which are classical in this field, we are led to modify again our algorithm by 
introducing the averaging principle independently introduced by Ruppert [35] and Polyak [19] and 
then widely investigated by several authors. It works both with two-time or single-time scale steps 
and leads to asymptotically efficient procedures, i.e., satisfying a CLT at the optimal rate ^/n and 
minimal variance (see also [30]). See also a variant based on a gliding window developed in |26| . 
Our numerical examples indicate that the averaged one-time-scale procedure provides less variance 
during the first iterations than the averaged procedure of the two-time-scale algorithm. Finally, we 
set 7„ = (3n in so that, the VaR-CVaR algorithm can be written in a more synthetic way by 
setting Zn = (Cn, Cn) and for n > 1: 

Zn = Zn-l - JnH{Zn-l,Xn), Zq = (^0,Co) , ^0 G L\F), (16) 

where H{z,x) := {Hi{S,,x), H2{^,C,x)). Throughout the rest of this section, we assume that 
the distribution (p{X) has a positive probability density f(p{x) on its support. As a consequence 
the VaRa{'^{X)) is unique so that the procedure algorithm Z„ converges a.s. to its single target 
(VaRai^iX)), "^-CVaRai^iX))) . Thus, the Cesaro mean of the procedure 

- Zo H h Zn~i 

Zn := , n > 1, 

n 

where Zn is defined by (fT6|) . converges a.s. to the same target. The Ruppert and Polyak's Averaging 
Principle says that an appropriate choice of the step yields for free the smallest possible asymptotic 
variance. We recall below this result (following a version established in [10], see [10] (p. 169) for a 
proof). 

Theorem 2.3. (Ruppert and Polyak's Averaging Principle) Suppose that the W^-sequence {Zn)n>o 
is defined recursively by 

Zn = Zn-l - 7n {h{Zn-l) + Cn + r„) 

where h is a Borel function. Suppose that z* is a zero of h satisfying 

h{z)=M{z-z*) + 0{\z-z*\^) (17) 
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where M = Dh{z*) is a uniformly repulsive matrix (all its eigenvalues have positive real parts), and 
(en)n.>i is a random sequence satisfying 



(i) E[e„,+i|:r„]l|||2^_^*||<(^| = 0, 

(a) 3b > 2, sup„E[||e„+i||''|JP"„] l|||z„-z*||<c| < +oo, 



3 C > 0, such that a.s. < 



(m) E (7n-i) ^ k„|^ l{||Z„-z*||<C} 



0, 



(18) 



3T £ S+{d,R) such that E [e„,+ie^+i|J^„] ^ T. 



S'ei 7n = ^ TO^/i i < a < 1, and 



+1 



Zq + ■■■ + Zn 

n + l 

Then, on the set of convergence {Zn z*}: 



Zn 



n + l 



{Zn - Zn), n>0. 



{Zn -z*) (0, M~^T{M-^f) asn^ +oo, 

where {M~^)'^ denotes the transpose of the matrix . 

To apply this theorem to our framework we are led to compute the Cesaro means of both compo- 
nents, namely for n > 1 



in '•— n Sfc=l ik — in-1 n Hn-l Cn)i 
7^ 1 v~^n 1 (Y^ ri \ 



(19) 



where {(,k, Ck)-, A; > is defined by (jl6p . In the following theorem, we provide the convergence rate 
of the couple Z„ := (^„,C„). 

Theorem 2.4. (Convergence rate of the VaR-CVaR procedure). Suppose {A'2)a holds for some a > 
1, is twice differentiahle at ^* and the density function f,p(x) of f{X) is continuous, difjerentiable 
and strictly positive at ^* . If the step sequence is jn = ^ with ^ < a < 1 and 71 > then 



V^{Zn- z*) -^M{0,T.) as +00 
where the asymptotic covariance matrix S is given by 

^^^^ - ^(D) iMx)>e}]^ 



[(^(V^W) - ^(D) lMX)>e}] (I^Var {{^{^{X)) - *(e)) lMX)>e}) 



Proof. First, the procedure (I16p can be written as for n > 1 

Zn = Zn^l - In {h{Zn^l) + £„,) , = (^0, Cq) , £ L\F), 



(20) 



(21) 



where h{z) := E[H{z,X)] = i^l - ^l-F {^{X) > ,C -E[w{^,X)]j and e„ := (AM„,A7V,), 
n > 1, denotes the J>i-adapted martingale increment sequence with 
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Owing to Assumption (^2)^, the differentiability of ^ at and Lebesgue's differentiation Theorem, 
one can interchange expectation and derivation, so that the function h is differentiable at z* = 
{C,C*) and 



h'{z*) =M :-- 



Now, since ^ is differentiable at E 



E 



(22) 



1 - j^F{ip{x) > D ) ^'(D = 



so that, M = is diagonal. Since ^ is twice differentiable at ^* and f^ix) is 

V ly 

differentiable at D'^h{^*) is bounded and (fT7j) is satisfied. 

To apply Theorem 2.3, we need to check assumptions {i)-{iv) of (fT8|) . 

Let A > 0. First note that 



-z*\<A} 



< 



1 



1 — a 



2a 



2^" < +00. 



Thanks to Assumption {A2)a, there exists Ca,^ > such that 

E [AN^l^lJ^n] l{\\z,.-z'\\<A} < Ca,^ (1 + "^iCnf") 1{||Z.-.1|<A} < C«,^ ( 1 + sup l^iOl^A < 

V i5-ri<^ / 

Consequently, (ii) of (jlSp holds true with 6 = 2a > 2 since 



supE [|e„+ip''|JP"„J l{|z„-^*|<A} < 

n>0 

It remains to check (iv) for some positive definite symmetric matrix T. The dominated convergence 
theorem implies that 



E 



E 



1 



1 — a 

a 



^ [ ^Mxm}] - K [ i{^(x)>^}] ^5=^^ 



1-a 



E 



'e[(^(^(X))-M'(0) lMX)>d|e=c„ 



1 



1 — a 



a 



(1-a) 



1 [ 1mx)>5}]|^=5^ 

2E[(vi/Mx))-vi/(r)) iMx)>e}]> 
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E 



{^n+l^n+l) 2,2 1"^ ^ 



E 



1 



-E [(^(^(X)) - ^(0) lMx)>a]|e=^„ 



(1-a)^ 



E 



-E[(^(^(x))-^(r)) 1 



{v{x)>e}J 



(1 - a)- 



rVar((vI/MX))-M/(r)) 1 



{vW>C*} 



)• 



Using the continuity of both functions ^ i-^ E — ^'(C)) l{(/3(x)>g}] and 



E 



^'((^(X)) — ^(^)) l|^(x)>5} a,t which follows from the continuity of ^ and of the 



distribution function of (/^(X), finally yields the a.s. convergence of E [e„+ie^_,_]^|^„] toward 



-^E - ^(D) i{^(x)>e}] 



^ 1^(I^IE [(^M^)) - ^(D) lM.Y)>e}] (^Var((M/MX))-M/(r)) lMX)>e})^ 
^ with 7i > and ^ < a < 1, Ruppert-Polyak's Theorem implies that 



If 7. 



[Zn - z* 



c 



AA(0,S) 



□ 



where S = M -"^T (Af -'^)^ is given by (j20p . This completes the proof. 

Remarks: • It is possible to replace w{^,x) in ([13]) and ([T5]) by x) = y^^'((/3(x))1|^(^)>^}. 
since C* = E [tD (^*, X)]. Thus, we only have to change also the martingale increment sequence 



(A7V„)„>i by AiV„ defined by 

V / n>l 



1 — a 



E [^(^(X))l|^(^)>5|]|^^^^_^ - ^(v.(X„))l|^(^„)>5„_,|j . 



This provides another procedure Cn for the computation of the ^'-CVaRo which satisfies a Gaussian 
CLT with the same asymptotic covariance matrix. 

• The quantile estimate based on the inversion of the empirical distribution function satisfies a 
Gaussian CLT with the same asymptotic covariance matrix than the one of the procedure see for 
example [36] p. 75. Obviously, there is no reason to believe that this first version can do better than 
the empirical quantile estimate. However, our quantile estimate has the advantage to be recursive: 
it naturally combines with a recursive IS algorithm in an adaptive way. In terms of computational 
complexity, once N loss samples have been generated, the behaviour of the inversion of the empirical 
distribution function method needs a sorting algorithm: good behaviour is 0(A^log(A^)) element 
comparisons to sort the list of loss samples. Whereas the behaviour of the recursive quantile 
algorithm is O (N). 

• One shows that if we choose I3n = n > \ and 7„ = with ^ < a < 1 in ([T5]) , the resulting 
two-time scale procedure satisfies a Gaussian CLT with the same asymptotic covariance matrix T 
(at rates yf^n and y/n). However, by averaging the first component the resulting procedure 
becomes asymptotically efficient (i.e. rate ^/n). 
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Proposition 2.5. (Estimation of variance and confidence interval) For every n > 1, set 



(1 — a)^ I n 



k=l 



where (Cn)n>o ^/^e /irsi component o/ ([6]). // (A2)a is satisfied for some a> 2, then 

^ (13^ ((^(^(^)) - ^(D) l^(X)>e) 

and 



Proof. The proof follows from standard arguments already used in the proof of the a.s. convergence 
of the sequence (C„)n>i defined by (fT3]) . □ 

In practice, the convergence of the algorithm will be chaotic. The bottleneck of this algorithm 
is that it is only updated on rare events since it tries to measure the tail distribution of (/^(X) : 
¥{ip{X) > VaRa) = 1 — a ~ 0. Another problem may be the simulation of ^p{X). In practice, we 
have to deal with large portfolios of complex derivative securities and options. Each evaluation may 
require a lot of computational efforts and takes a long time. So, for practical implementation it is 
necessary to combine the above procedure with variance reduction techniques to achieve accurate 
results at a reasonable cost. The most appropriate technique when dealing with rare events is IS. 



2.3 Some background on IS using stochastic approximation algorithm 



The second tool we want to introduce in this paper is a recursive IS procedure which increases 
the probability of simulations for which ip{X) exceeds ^. Our goal is to combine it adaptively 
with our first naive algorithm. Assume that X has an absolutely continuous distribution ¥x{dx) = 
p{x)Xd{dx) where denotes the Lebesgue measure on (R'^, Bor{W^)). The main idea of importance 
sampling by translation applied to the computation of 

E[F{X)], 

where F £ L'^{Fx) satisfies F{F{X) / 0) > 0, is to use the invariance of the Lebesgue measure by 
translation, for every 9 G M*^, 



E[F{X)] = E 



F{X + 9) 



p{X + 6) 



p{X) 



(24) 



and among all these random vectors with the same expectation, we want to select the one with the 
lowest variance, i.e. the one with lowest quadratic norm 



Q{9) :=E 
If the following assumption 



/{x + e) 

p\X) 



< +00, 



(25) 
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holds true, then Q is everywhere finite and a reverse change of variable shows that: 



Q{e) = E 



F2(X) 



p{X-9)_ 



(26) 



Now if p satisfies 



{i) Vx G M'^, 6 H-> p(^x — 6) is log-concave 

{ii) Vx G lim|e|^+^p(x - 0) = or Vx G lim|e|^+oo = 



(B2) 



one shows that Q is (strictly) finite, convex, goes to infinity at infinity so that argminQ = 
{VQ = 0} is non empty (see [1] and [27j). Provided that VQ admits a representation as an 
expectation, then it is possible to devise a recursive RM procedure to approximate the optimal 
parameter 0* . Recursive IS by stochastic approximation has been first investigated by Kushner 
and then by several authors, see e.g. [Tl] and [H] in order to "optimize" or "improve" the change 
of measure in IS using a stochastic gradient RM algorithm based on the representation of 'VQ{9). 
Recently, it has been brought back to light by Arouna (see [I]) in the Gaussian case, based on the 
natural representation of VQ obtained by formally differentiating (j26p . Since we have no knowledge 
about the regularity of F and do not wish to have any, we differentiate the second representation 
of Q in ([26]) and not 1^. We obtain VQ{e) = E [Kie,X)]. 

When X =7^(0, 1), Q{e) = e^K[F^{X)e~^^] so that K{e,x) = e^F2(x)e-^^(6l - x). How- 
ever, given this resulting form of K, the classical convergence results do not apply since | |i^(^, ^)| b 
is not sub-linear in 6 (see condition ^ of Theorem 2.2). This induces the explosion of the pro- 
cedure at almost every implementation as pointed out in [1]. This leads the author to introduce 
a "constrained" variant of the regular procedure based on repeated reinitializations known as the 
projection "a la Chen" . It forces the stability of the algorithm and prevents explosion. Let us also 
mention a first alternative approach investigated in [l] and [3], where Arouna and Bardou change 
the function to be minimized by introducing an entropy based criterion. Although it is only an 
approximation, it turns out to be often close to the original method. 

Recently, Lemaire and Pages in [27] revisited the original approach and provided a new repre- 
sentation of VQ{6) for which the resulting K{0, X) has a linear growth in 6 so that all assumptions 
of Theorem 2.2 are satisfied. Thanks to a third translation of the variable 9, it is possible to plug 
back the parameter 9 "into" F, the function F having in common applications a known behaviour 
at infinity which makes possible to devise a "regular" and "unconstrained" stochastic algorithm. 
We will rely partially on this approach to devise our final procedure to compute both VaR and 
CVaR. To be more specific about the methodology proposed in [27], we introduce the following 
assumption on the probability density p oi X 



3b £ [1,2] such that < 
and introduce the assumption on F : 



(0 



0(\x[ 



as \x\ — > oo 



(B3) 



(a) 3p > O,log {p{x)) + p\x\'' is convex. 



VA > 0, E 



F{X) 



2^X1" 



< +00. 



(B4) 
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One shows that as soon as (Bl), (B2), (B3) and (B4) are satisfied, Qi and Q2 are both finite and 
diff^erentiable on M'^ with a gradient given by 



VQ{e) := E 



F{x - ef 



Vp{X - 20) 



p{X)p{X-2e) p{X-20) 



w{e,x) 



(27) 



This expression may look comphcated at first glance but in fact the weight term W{6, X) can be 
easily controlled by a deterministic function of 9 since 

\W{e,X)\ < e'^P^^^' {A\x\^-'^ + A\e\^-'^ + B) (28) 

for some real constants A and B. In the case of a normal distribution X = M{0; 1), 

W{e,X) = e^\29-X). 
So, if we have a control on the growth of the function F, typically for some positive constant c 

' Vx G M'^, |F(x)| < G{x) and G{x + y) < C{1 + G{x)Y{l + G{y)y 

E[|X|2(^-i)G(X)4^] < +00, 



(B5) 



then by setting 



we can define K by 



W{e,X) :-- 



-2p\e\^ 



l + G(-^)2^ 



wie,x), 



(29) 
(30) 



K{9,x) := F{x -efW{e,X) 
so that it satisfies the linear growth assumption ([8]) of Theorem 2.2 and 

G M'^ I E [K{e, X)] = o| = G M'' I VQ{e) = o| . 

Moreover, since Q is convex VQ satisfies d?]). Now we are in position to derive a recursive uncon- 
strained RM algorithm 

On = On-l - JnK{Bn-l,Xn), ^0 £ (31) 

that a.s. converges to an arg min Q- valued (square integrable) random variable 9*. 



3 Design of a faster procedure: importance sampling and moving 
confidence level 

3.1 Unconstrained adaptive importance sampling device 

We noted previously that the bottleneck in using the above algorithm lies in its very slow and 
chaotic convergence owing to the fact that ¥{(p{X) > ,^*) = 1 — a is close to 0. This means that 
we observe fewer and fewer simulations for which f^Xi^) > £,k-i as the algorithm evolves. Thus, it 
becomes more and more difficult to compute efficiently some estimates of VaRa and CVaRo when 
a ~ 1. Moreover, in the bank and energy sectors, practitioners usually deal with huge portfolio 
made of hundreds or thousands of risk factors and options. The evaluation step of f{X) may be 
extremely time consuming. Consequently, to achieve accurate estimates of both VaRc and CVaRo, 
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with reasonable computational effort, the above algorithm (|16p drastically needs to be speeded up 
by an IS procedure to "recenter" the simulations where "things do happen", i.e. which generates 
scenarios for which (p{X) exceeds ^. 

In this section we will focus on IS by mean translation. Our aim is to combine adaptively the IS 
(unconstrained) recursive procedure investigated in [27] with our first "naive" approach described 
in p^ . Doing so every new sample is used to both optimize the IS change of measure and update 
VaR and CVaR procedures. We plan to minimize the asymptotic variance of both components of 
the algorithm (in its "averaged" form, as detailed in Theorem 2.4), namely 



a 



I - a) _ Var(l^(x)>e) 



and. 



Var((^((^(X))-^(e))l^(X)>e) 



(l-a)2 

provided the non-degeneracy assumption 



for the VaRo, 



for the CVaRn 



Veeargminy, F U^{^{X)) - ^{OY 1mx)>^} > 0} >0 



(32) 



(33) 



(AS) 



holds. Since the density /<^(x)(^*) is an intrinsic constant (and comes in fact from the Jacobian 
matrix Dh{^*,C*) of the mean function h of the algorithm) we are led to apply the IS paradigm 
described in Section 2.3 to 

Fl{X) = l^(^x)>i* and F*{X) = {^{^{X)) - ^>{C)) l{^(x)>r}- 

Let us temporary forget that of course we do not know ^* at this stage. Those two functionals are 
related to the minimization of the two convex functions 



Q2(/u,r) 



E 
E 



4v(^)>e} 



p{X-9) 



{^{^{x))-^{e)fi{^^xm,y^ 



p{X) 



'p{X-fi) 

We can apply to these functions the minimizing procedure ([3T]) described at section 2.3. Since 



(34) 
(35) 



I — a 

it is clear, owing to (p^ that 



Fi* (x) and H2 {C ,C*,x) = C* - ^{C 



1 - Q 



(36) 



E[HiiC,X)]=K 



H,{C,x + e) 



Pix + 



p{x) 



1, 2. 



Now, since we do not know either ^* and C* (the VaRc and the CVaRo-) respectively we make 
the whole procedure adaptive by replacing at step n, these unknown parameters by their running 
approximation at step n — 1. This finally justifies to introduce the following global procedure. One 
defines the state variable, for n > 0, 



Zr) 



where ^n, Cn denotes the VaRa and the CVaRo, approximate. On-, /^n denotes the variance reducers 
for the VaR and the CVaR procedures. We update this state variable recursively by 



Zn — Zn-l — 7n-^ {Zn-li Xn) , 



(37) 
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where (-'^n)„>i is an i.i.d. sequence with distributions X (and probability density p) and 



1 ^ p{x + 9) 



L4 



p{x) 



p{x + 



-2p|e|' 



1 — a 



-M'^-(^)^i}p{x)p{x-2e) p{x-2e) ' 



(38) 



-2p|/.|f 



l + G{-^lY^ + ^{i)' 



{■^{^{x-p.))-^{i)f 



xl 



p^(2; — Vp(x — 2;u) 
M^-M)>C}p(^)p(^_2^) p{x-2n) ■ 



(39) 



The following proposition establishes the a.s. convergence of the procedure. For the sake of sim- 
plicity we will assume the uniqueness of the VaR^ of (p{X). 

Proposition 3.1. (Efficient computation of VaR and CVaR). Suppose that '^{(p{X)) £ L? (P), that 
the distribution function of ip{X) is continuous and increasing (so that VaRa{'p{X)) is unique) and 
that (A3) holds. Assume that, for every ^ G M, Qi{.,^) (i=l,2) satisfies (Bl), i.e. 



ye G w, E 



(i + (*((^(x))-M/(0)' 



P{X) 

p{x-e) 



< +00. 



(40) 



Suppose that p satisfies (B2) and (B3) and that 



y{^{X)f + l)e 



A\X\^ 



< +00. 



Vyl > 0, E 

Assume that the step sequence (7n)n>i satisfies (Al). Then, 

Zn^z* := (r,C*,C,/x;) 

where ^* = VaRa{^{X)), C* = ^-CVaRa{(p{X)) and {0^, fJ-a) 0,''"^ the optimal variance reduc- 
ers (to be precise some random vectors taking values in {VQi(^*,.) =0} and {VQ2('^*!-) = 0} 
respectively). 

Proof. We first prove the a.s. convergence of the 3-tuple {Cn,dn, fJ-n) that of (C„)„>i will follow 
by the same arguments used in the proof in Section 2.2. The mean function / is defined by 



p.) := 1 



1 — a 



'{ip{X)>0,e-^P\'\\Qi {9,0 



rVQ2(A^,0 



hence, 



T* = {i = o} = {c} X {vQi(r , .) = 0} X {vQ2(r , = 0}. 

In order to apply the extended Robbins-Monro Theorem, we have to check the following facts: 
• Mean reversion: One checks that VC = (C, 6*, A*) G K x M'^ x M'^ \T*, VC* G T* , 



{(-cm) = (e-D ^^^^ 7-^^ ^ +-, {9-9l,VQ^i9,0) 

1 — a 1 — a 

g-splMl" 

+ (l-a)(l + F(-/.)2^) - > 0' 
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owing to the convexity oi 9 Qi{(^,i) and ^ ^ Q2(^)0i fo'^ every ^ G R. 
• Linear growth: Let us first deal with Li. First note that: 



E 



< C 1 + E 



-apiei" 



< C 1 +E 



-2p\e\^ 



P{X) 



p{X 



Now, elementary computations show (see [27j for more details) that (B3)(ii) implies that 



so that 



E 



p{x — 9) 
-2p\e\b pjX) 



p{X 



< E 



p{x + 9), 

p{X + 9) 
P{X) 



1. 



L3 and L4 can be treated by a straightforward adaptation of the proofs in [27]. Then, one can 
apply Theorem 2.2 which yields the announced result for {^n,0n, fJ'n)- The a.s. convergence of C„ 
toward C* can be deduced from the a.s. convergence of the series 



■.= Y,7k^Mk, n> 1, 



k=l 

where AMn are martingale increments defined by 
AM, = E[(^((^(X))-^(e)) l{^(x)>c}]|5=c„_, 



-'-{<^(X„+Atn-l)>5n-l} 



p{Xn) 



n > 1, 



satisfying 



E 



< E 



l«=Cn-i,e'= 



We conclude by the same arguments used in the proof in Section 2.2. 



□ 



Now, we are interested by the rate of convergence of the procedure. It shows that the algorithm 
behaves as expected under quite standard assumptions: it satisfies a Gaussian CLT with optimal 
rate and minimal variances. 

Theorem 3.2. Suppose the assumptions of Proposition 3.1 hold true. Assume that ^{ip{X)) G 
L^"(P) for some a > 1 and that the step sequence is jn = with ^ < p < 1 and 71 > 0. Suppose 
furthermore that ^ is twice differentiable at ^* and that the density f^(x) '^^ differentiahle and 
strictly positive on its support. Let (^„,C„)„>i he the sequence of Cesaro means defined by: 

- ^o + --- + ^n-l 7=7 Co + ... + C„_i 



n 



n 



This sequence satisfies the following CLT: 



■AA(0,S*) as +00, 



(41) 
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where 



l{^(x+es)>e} p^j^^ — J ' 

Proof. The proof is built like the one of Theorem 2.4. If we denote h the mean function of the 
global algorithm h{z) = E.[L{z, X)], the algorithm (j37p can be written as 

Zn = Zn-l - 7n (/i(^n-l) + ^n) , ^ > 1, Zq = (^0,0) , ^0 G L^F), (42) 

where the first two components of h are the same function as the ones in the proof of Theorem 2.4 
and (en)n>i denotes the .7>i-adapted martingale increments sequence where 

f~2,n := ^(E[(v&MX))-vI/(e)) lMX)>dl«=?n 



One can check easily that the sequence {en)n>i satisfies (i) — {iv) of (jlSp . □ 

Remarks: • There exists a CLT for the whole sequence {Zn)n>i and for its empirical mean 
{Zn)n>i according to Ruppert and Polyak averaging principle. We only stated the result for the 
two components of interest (the ones which converge to VaR and CVaR respectively) since we only 
need rough estimates for the other two (see below). 

• In the first Central Limit Theorem (Theorem 2.4) for quantile estimation, the factor a(l — a) is 
the variance of the indicator function of the event {^{X) > ^*}. With our recursive IS procedure, 
it is replaced by the variance of the shifted indicator function modified by the measure change: 
Var (^'^{ip{x+e*)>^*} ^^pQC)°'^ ^ ■ For further details on the rate of convergence of the unconstrained 
recursive importance sampling procedure, we refer to |27j . 

Now, let us point out an important issue. The algorithm (|37p raises an important problem 
numerically speaking. Actually, we have two algorithm ^„ and {On^f^n) that are in competitive 
conditions, i.e. on one hand, we added an IS procedure to (^n)n>i to improve the convergence 
toward and on the other hand, the adjustment of the parameters (^n,/^n) "need" some samples 
Xn+i satisfying ip{Xn+i — 9n) > (,n and </5(X„_|_i — //„) > ^„ (^f = Id) which tend to become rare 
events. Somehow, we postponed the problems resulting from rare events on the IS procedure itself 
which may "freeze". This in term suggests to break the link between the VaR-CVaR and the IS 
procedures by introducing a VaR companion procedure that will drive the IS parameters to the 
tail distribution. A solution to do this is to make the confidence level increase slowly from a lower 
value (say oq = 50%) up to the target level a. This kind of incremental threshold increase has 
been already proposed in |22j in a different framework. This idea is developed in the next section. 



19 



3.2 How to control the move towards the critical risk area: the final procedure 

From a theoretical point of view, so far, we considered the purely adaptive approach where we 
approximate (^*, C*, 0* , /i* ) using the same innovation sequences. From a numerical point of view, 
we only need a rough estimate of the optimal IS parameters So that we are led to break 

the algorithm into two phases. Firstly, we compute a rough estimate of the optimal IS parameters 
(Omj/J-m) with a small number of iterations M and in a second time, estimate the VaRo and the 
CVaRo, with those optimized parameters with N iterations (M ^ in practice). 

Now, in order to circumvent the problem induced by the IS procedure, we propose to introduce 
companion VaR procedure (without IS, i.e., based on Hi from Section 2.2) that will lead the 
IS parameters into the critical risk area during a first phase of the simulation, say the first M 
iterations. An idea to control the growth of 9n and /x^ at the beginning of the algorithm, since we 
have no idea on how to twist the distribution of (p{X), is to move slowly toward the target critical 
risk area (at level a) in which ^p{X) exceeds ^ by introducing a non-decreasing sequence On slowly 
converging to a during the first phase. Since the algorithm for the CVaR component Cn is free of 
a, by doing so, we only modify the VaR procedure (,n- The function Hi in ()16p is replaced by its 
counterpart which depends on the moving confidence level On, namely 

L = L- i-7nHi(in-i,Xn,an) , n > 1 , = ^0 G (F) • (43) 

where, 

VeeM, VxGM^VaG]0,l[, Hi (C, x, a) = 1 - lMx)>a- 
The sequence ( f n I is only designed to drive "smoothly" the IS procedures toward the "critical 

V /n>0 

area" at the beginning of the procedure, say during the first M iterations and in no case to 
approximate ^* or C* . To be more precise, we define recursively the variance reducer sequence 
{On)n>i, (An)n>i by plugging at each step n, ^n-i into L^^., 9n-i,Xn) and Li{., fin-i, Xn) as defined 
in Section 3.1. This reads as follows, for n > 1, 



9n = dn-1 — InLs iCn-l,9n-l,Xn] , Oq^^'^, (44) 



n-1 — InLi ( in-1, fl'n-l,Xn ) , /^O G I^"'- 



Although, we are not really interested in the asymptotic of this procedure (^n)) its theoretical 
convergence follows from Theorem 2.2: as a matter of fact if we define a remainder term r„, by: 

r-n ■= Hi (j,n-l,Xn, " Hi (^in-l,Xn^ , n > 1, 

the procedure defined by ()44p now reads 

ln=ln-l-7n,(^l(l„-l,^n)+r„,), n > 1, |o G ^'(F)- (45) 

One checks that 

\rn\ < 



(l-a)2' 

so that Assumption ([9]) of Theorem 2.2 is satisfied as soon as 

^7„(a - a„,)^ < +00. 

n>l 
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3.3 A final procedure for practical implementation 

In practice, we divided our procedure into two phases: 

> Phase I is devoted to the estimation of the variance reducers (Oajf^a) using (j44p . The moving 
confidence level a has been settled as follows {M ~ 15000) : 

a„ = 50% for 1 < n < Mi := M/3, a„ = 80% for Mi < n < 2Mi, a„ = a for 2Mi <n< M. 

\> Phase II produces some estimates for (^*,C*) based on the procedure defined by (j37|) and its 
Cesaro mean with N iterations. Note that during this phase, we keep on updating the IS parameters 
adaptively. 

Now, we can summarize the two phase of the final procedure by the following pseudo-code: 



Phase I: Estimation of (/i*,6l*). M <€. N (typically M ^ iV/100). 
for n = 1 to M do 



'n-1 — InLz I in-l,Gn-l,Xn I , 



5: /tn ^ — 7ri-^4 (^^n-l, /tn-l, 

6: end for 

7: Phase II: Estimation of (^*,C*). Set, for instance, = ihU Cq = 0, Oq = Om, and hq = fiM- 

8: for n = 1 to do 

9: ^ in-l — InLl {^n~l, (^n~l, Xn) 

10: Cn ^ Cn-1 — 7n-^2 , C^-i , /i„_i , X„) 

11: 9n On~l — ^nL'i {^,n~l,On~l, Xn) 
12: /X„ ^ /X„„i — 7„,L4 (^„_1, ^„_1, X„,) 

13: Compute the Cesaro means 

14: ^ ^ 

n n 

15: end for 

16: {CC*) is estimated by {Cn,Cn)- 

An alternative, especially as concerns practical implementation, is to replace to Phase II by 

Phase II' in which the variance reducers coming from Phase I are frozen at 6mi Am- 

The only updated sequence is (^„, C^), as follows 



Cn = Cn-1 - InLl [Cn-1, OM,Xnj , 
Cn = Cn-1 — ^nL2 {Cn-1, Cn-1, flM, Xn) ■ 



4 Towards some extensions 

4.1 Extension to exponential change of measure: the Esscher transform 

Considering an exponential change of measure (also called Esscher transform) instead of the mean 
translation is a rather natural idea that has already been investigated in |20j and [27] to extend 
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the constrained IS stochastic approximation algorithm with repeated projections introduced in pQ. 
We briefly introduce the framework and give the main results without any proofs (for more details, 
see [27j and [H]). Let i{j denote the cumulant generating function (or log-Laplace) of X i.e. the 
function defined by ip{9) := logE[e^^'"'''^]. We assume that V'(^) < +00, which implies that ip is an 
infinitely differentiable convex function and define 



Pe{x) 



x), X G 



We denote by X^^^ any random variable with distribution pQ. We make the following assumption 
on the function ip 



limip{9) - ( - ) = +00 and 36 > 0, 9^ ^{9) - 5\9\'^ is concave. 



The two functionals to be minimized are 

Qi{9,e) ■■= E 



1r .^ip-^^'-^^+'^W 



According to Proposition 3 in [27] as soon as "0 satisfies (H^^) and that, 

Ve G M, V0 G R'^, E[\X\ (1 + ^((^(X))2) e^^'^'^] < +00, 



(46) 
(47) 

(48) 



for every ^ G M, the functions Qi{.,0 ^^'^ Q2(-)0 ^.re finite, convex, differentiable on W^, go to 
infinity at infinity, so that argminQi(., ^) and argminQ2(-)0 a-re non empty. Moreover, their 
gradients are given by 



VeQi{9,0 = E 

E[Xe<«-^>l 



(VV(A^) - X(->^^){^{^{X^-^^)) - vI'(0)'l{^(^(-.,)>5} 



(49) 



with \/'tlj{9) = -^i^^^xyy^- Now, the main result of this section is the following theorem (for more 
details, we refer to [27] and [13]). 



Theorem 4.1. Suppose that ip satisfies (-ff|'^) and that {A2)i, (A3) hold. Assume that (jl8j) is 
fulfilled and that 

Vx G M^, |^'(v'(x))| < Ce^l^l and E[\X\^e^^^^] < +00. 
One considers the recursive procedure 

Zn = Zn^l - JnL{Zn^l,Xn), n > 1, Zq = (Co , C'o , ^0 , /^o) (51) 

where (7n)n>i satisfies the usual step assumption (Al), Zn ■= iCn,Cn,9n, fJ^n) ctnd each component 
of L is defined by 



Li ( Cn-l, 9n^i, X; 



'/'(e„_i)+i^(-e„-i) 



■.= C-w{Cn-l,fin-l,X^^-") 



1 -, i>i0r^-l)-(xi'''~^\e„-l) 

(A'7l-l)^ 



L3 [Cn-l, 9n-l, X; 



(-A»n-l) 



g-4^|V^(-M„-l) 



x(VV(/U„_i)-X("A^"-i)), 



22 



with w{tf^,oc) := ^(0 + —-{^{^(x)) - $(e))l|^(,.)>5| e'/'M-<A'.->. 

Then, converges a.s. toward z* := (,^*, C*, 0* , /i* ), where ^* is a square integrahle VaRa- 
valued random variable, C* = ^ -CV aRa{'^{X)) , 0* is a (square integrahle) aigm.m.Qi[., £^*)-valued 
random vector and /i* is a (square integrahle) arginmQ2{-,^*) -valued random vector. 

4.2 Extension to infinite dimensional setting 

In the above sections, we proposed our algorithm in a finite dimensional setting where the value 
of the loss L = (p{X) is a function of a random vector having values in M"'. This is due to the 
fact that generally the value of a portfolio may depend on a finite number of decisions taken in 
the past. Thus, the value of the loss at the horizon time T — t may depend on a large number of 
dates in the past to = ^ < < ^2, ■■■ < t]\f = T — t, with A'^ = 250 for a portfolio with time interval 
T — t = 1 year. For instance, if we consider a simple portfolio composed of short positions on 250 
calls with a maturity at each t^ and a strike K. The loss at time t^v = 1 year can be written: 

N 

L = Y, e^^'^-'^\St^ - K)+ - e"*^Co^ 
fc=i 

where Cq denotes the price of the call of maturity tj and strike A', with 

2 , 

Q _ C g(T--^){tfe+l-ife)+o-A/(tfc+l-tfc)^fe 

So that, X = Z = (Zi, Z250) is a Gaussian vector with d = 250. Consequently, with our 
above procedure, On and /i„ are two random vectors of dimension d and we have to control the 
growth of each component. If one grows too fast and take too high values, it may provides bad 
performance and bad estimates of both VaR and CVaR. To circumvent this problem, one can 
reduce the dimension of the problem by choosing the same shift parameters for several dates, i.e. 
for instance 

— \ y n 1 -J i '^n ' • ■ • ) "n i -j i "n J ■ 

10 times 10 times 

Now, we can run the IS algorithm for 9^,.. .,6'^^ so that, we have to deal with a procedure in 
dimension 25. It is sub-optimal with respect to the procedure in dimension 250 but it is more 
tractable. Another relevant example is a portfolio composed by only one barrier option, for instance 
a Down &: In Call option 

^{X) = (At - A:) + 1 {,r,in^o<t<T}X,<L} 

where the underlying X is a process solution of the path-dependent SDE 

dXt = b{Xt) dt + cj{Xt) dWt, Xo = x£ (52) 

W = {Wt)te[o,T] being a standard Brownian motion. A naive approach is to discretize (j52]) by an 
Euler-Maruyama scheme X = (A"t^,)fcg|o^,.,^„} 

Xt,+, = Xt, + b{Xt,){tk+i - tk) + a{Xt,){Wt,_^, - Wt,), Xo = xo€ R. 

This kind of approximation is known to be poor for this kind of options. In this case, our IS 
parameters 6 and fi are n-dimensional vectors which correspond to the number of steps in the 
Euler scheme. Now, if you consider a portfolio composed by several barrier options with different 
underlyings, the dimension can increase greatly and becomes an important issue, so that our first 
IS procedure is no longer acceptable and tractable. To overcome this problem, the idea is to shift 
the entire distribution of X in (j52p thanks to a Girsanov transformation. This last case is analyzed 
and investigated in [27]. It can be adapted to our framework (see [13] for further developments). 
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5 Numerical examples 



For the sake of simpUcity, we focus in this section on the finite dimensional setting and on the 
computation of the regular CVaRo = Id). We first consider the usual Gaussian framework in 
which the exponential change of measure coincide with the mean translation change of measure. 
Then we illustrate the algorithm (jSip in a simple case. 

5.1 Gaussian framework 

In this setting, X ~ AA(0, Id) and p is given by 

= (27r)^ie~^, xGM'^, 
so that (B3) and (B4) are satisfied with p = ^ and 6 = 2. In this setting, we already noticed that 
L3{^,d,x) := l|^(^Y-e)>c}(26' - x), 

L4{C,P,x) := ---^J-^_-^(^(X-/x)-e)^(2/i-x). 

Moreover, we use a stepwise constant sequence an that slowly converges toward a as proposed in 
Section 3.3. We consider three different portfolios of options (puts and calls) on 1 and 5 under- 
lying assets (except for the last case). In the third case, we study the behaviour of a portfolio 
composed by a power plant that produces electricity from gas with short positions in calls on elec- 
tricity. The assets are modeled as geometric Brownian motions for the first two examples. In the 
third example, the assets (electricity and gas day-ahead prices) are modeled as exponentials of an 
Ornstein-Uhlenbeck process. This last derivative is priced using an approximation of Margrabe 
formulae (see e.g. [29]). We assume an annual risk free interest rate of 5%. In each example, we 
use three different values of the confidence level a = 95%, 99%, 99.5%, which are specified in the 
Tables. We use the following test portfolios: 

Example 1. Short position in one put with strike i^T = 110 and maturity T = 1 year on a stock 
with initial value Sq = 100 and volatility a = 20%. The loss is given by 

MX) := {K - St)+ - e'-^ Po 

with 

St := 5oe((^-^)^+'^^^) 

where X ~ AA(0, 1) and Pq is the initial price at which the put option was sold (it is approximately 
equal to 10.7). The dimension d of the structural vector X is equal to 1. The numerical results are 
reported in Table 1. 

Example 2. Short positions in 10 calls and 10 puts on each of the five underlying assets, all 
options having the same maturity 0.25 year. The strikes are set to 130 for calls, to 110 for puts 
and the initial spot prices to 120. The underlying assets have a volatility of 20% and are assumed 
to be uncorrelated. The dimension d of the structural vector X is equal to 5. The numerical results 
are reported in Table 2. 

Example 3. Short position in a power plant that produces electricity day by day with a maturity 
of T = 1 month and 30 long positions in calls on electricity day-ahead price with the same strike 
K = 60. Electricity and gas initial spot prices are Sq = 40 $/MWh and Sq = 3 $/MMBTU 
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(^BTU; British Thermal Unit) with a Heat Rate equals hji = 10 BTU/kWh and generation costs 
C = 5 $/MWh. The two spot prices have a correlation of 0.4- The payoff can he written 

30 
k=l 

where P§ is a proxy of the price of the option on the power plant and is equal to 149.9 and Cq 
is the price of the call options which is equal to 3.8. This is a sum of spark spread options where 
we decide to exchange gas and electricity each day during one month. The dimension d of the 
structural vector X is equal to 60. The numerical results are reported in Table 3. 

The results displayed in the following tables correspond to the VaR, the CVaR and the variance 
reduction ratios estimations for both VaR and CVaR procedure using a number of steps specified 
in the first column, still for the same three levels of a. The variance ratios correspond to the ratio 
of an estimation of the asymptotic variance using the averaging procedure of (|16p divided by an 
estimation of the asymptotic variance using the averaging procedure of (|37|l : VRvaR corresponds to 
the variance reduction ratio of the VaR estimate and VRcvaR corresponds to the variance reduction 
ratio of the CVaR estimate. The results emphasize that the IS procedure yields a very significant, 
sometimes huge variance reduction especially when a is closed to 1. 
In the three examples, we define the step sequence by 7^ = ^liliQQ where (3 = \. 



Table 1: Example 1 Results 



Number of steps 


a 


VaR 


CVaR 


VRvaR 


VRcVaR 


10 000 


95% 


24.6 


29.9 


5.5 


30.5 




99% 


34.4 


37.5 


11.1 


125.3 




99.5% 


37.8 


41.4 


13.4 


192.9 


100 000 


95% 


24.6 


30.4 


6.6 


32.2 




99% 


34.2 


37.9 


11.5 


127.9 




99.5% 


37.3 


40.7 


15.1 


185 


500 000 


95% 


24.6 


30.3 


7.7 


31.3 




99% 


34.2 


38 


14.6 


118.4 




99.5% 


37.3 


40.5 


15.5 


184 





Table 2: 


ExampL 


e 2 Results 




Number of steps a 


VaR 


CVaR 


VRvaR 


VRcVaR 


10 000 


95% 


339 


440.5 


6.5 


14.9 




99% 


493.1 


561.4 


10.1 


24.3 




99.5% 


540.1 


606.4 


18.2 


37.9 


100 000 


95% 


349.8 


439.7 


6.7 


17 




99% 


495.7 


563.8 


11.3 


28.6 




99.5% 


544.8 


607.8 


18.9 


40.3 


500 000 


95% 


352.4 


439.6 


6.8 


17.3 




99% 


495.2 


563 


11.1 


27.7 




99.5% 


545.3 


608.4 


19.2 


37 
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Table 3: 


ExampL 


e 3 Results 




Number of steps a 


VaR 


CVaR 


VRvaR 


VRcVaR 


10 000 


95% 


115.7 


150.5 


3.4 


6.8 




99% 


169.4 


196 


8.4 


12.9 




99.5% 


186.3 


zio.z 


io.o 


20.3 


100 000 


95% 


118. 1 


150.5 


4.5 


8. ( 




99% 


169.4 


195.4 


12.6 


17.5 




99.5% 


188.8 


212.9 


15.6 


29.5 


500 000 


95% 


119.2 


150.4 


5 


9.2 




99% 


169.8 


195.7 


13.1 


18.6 




99.5% 


188.7 


212.8 


17 


29 



5.2 Esscher transform: the NIG distribution 

Now, we consider a simple case of portfolio composed by a long position on a Call option with 
strike K = 0.6 and maturity T = 1 year, where the underlying is e^'^ {Xq = 0), where Xt is a 
Normal Inverse Gaussian (NIG) variable, Xt ~ NlG{a, (3,6, fi), a > 0, \(3\ < a, 5 > 0, fi £ W. Its 
density is given by 



where Ki is a modified Bessel function of the second kind and 7 = ya^ — . Note that the 
generating function of the NIG distribution is given by 

and is not well defined for every G R, so that we change the algorithm parametrization (see 
section 4.3 of [27|). The loss of the portfolio can be written L = ip4{Xt) = 50(e"'''^ — K)^ — e^'^Cq. 
Note that the price Co is computed by a crude Monte Carlo and is approximately equal to 42. The 
parameters of the NIG random variable Xt are a = 2.0, (3 = 0.2, 5 = 0.8, /x = 0.04. We want 
to compare the variance reduction achieved by the translation of the mean (see section 3.1) and 
the one achieved by the Esscher Transform (see section 4.1). In the Robbins- Monro procedure, we 
define the step sequence by 7„ = ^p^^^q where /3 = | . 

> Translation case. The functions L3 and L4 of the IS procedure are defined by: 



r(.ox\ - .-m, y(X - 29) f p{X - 6) \ 

m,9,X) .- e [p{X-29)) 



2 



^^^^'^'""^ •= i+G{-,)+e^^^''-^^-^^^ p{x) Irt^^J ' 

where p' is easily obtained using the relation on the modified Bessel function K[{x) = -Ki{x) — 
K2{x). 

\> Esscher Transform. In this approach, the functions L3 and L4 are defined by 
L3{^,9,X) := 1^(^(-.,)>^(VV^(^)-X(-^)), 

U^,^i,x) := ^(^(x(-'^))-o^(vv^(/x)-x(-'^)). 
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where ~ NIG(a, ^±6,6, fi). 

Table 4 compares the variance reduction ratios of the VaRc and CVaRo, algorithms achieved by 
the translation of the mean (VRy^^ij and ^^cVaR.) '-'^^ achieved by the Esscher Transform 

i^^'v'aR and VR^Vafi)- 

Table 4: Example 4 Results 



Number of steps 


a 


VaR 


CVaR 




VRcVaR 


VRvaR 


VI^CVaR 


10 000 


95% 


85.8 


215.7 


5 


10 


4.2 


58.8 




99% 


217 


518 


6 


12 


8 


60 




99.5% 


304 


748 


8 


25 


8.9 


110 


100 000 


95% 


87.2 


215.1 


5 


12 


4.5 


60 




99% 


218 


521 


5 


12 


8.2 


70 




99.5% 


303.5 


747.8 


7 


30 


12 


100 


500 000 


95% 


87.9 


215.6 


5 


9 


5 


57 




99% 


227 


518.9 


5.5 


11.8 


11.5 


68 




99.5% 


312.8 


741.8 


6 


31 


10 


123 



The IS procedure is very efficient when ¥({p{X) > = 1 — a is close to zero and becomes 
more and more efficient as a grows to 1. Even for the complex portfolio considered in Example 3, 
where X is a Gaussian vector with d = 60, it is possible to achieve a great variance reduction for 
both VaRc and CVaRa. 

We observed that IS based on Esscher transform is well adapted to distributions with heavy tails 
(i.e. heavier tails than the normal distribution). It is therefore suitable when large values are more 
frequent than for the normal distribution, as it is the case when the vector X is a NIG random 
variable. Indeed, in this setting, the IS parameters modify the parameter (3 which controls the 
asymmetric shape of the NIG distribution. We think that the IS procedure by Esscher transform 
outperforms the IS procedure by mean translation when the IS parameter impacts on the symmetry 
of the distribution. 

Concluding remarks. 

In this article, we propose a recursive procedure to compute efficiently the Value-at-Risk and 
the Conditional Value-at-Risk using the same innovation for both procedures. In our approach, 
for a given risk level a, the VaRo, and the CVaRo, are estimated simultaneously by a regular RM 
algorithm. Ruppert and Polyak's averaging principle provides an asymptotically efficient procedure. 
The estimates satisfy a Gaussian CLT. However, due to the slow convergence of the global procedure 
since we are interested in rare events, the regular version of this algorithm cannot be used in 
practice. To speed-up and thus greatly reduce the number of scenarios, we devise an unconstrained 
adaptive IS procedure. The resulting procedure provides estimates that satisfy a CLT with minimal 
variances. To optimize the move to the critical risk area, the risk level a can be temporarily replaced 
by a slowly increasing level a„ (stepwise constant in practice) converging to a. This produces a 
VaR companion procedure {(,n)n>i that controls the IS change of measure parameters {9n,fi'n)- 
Numerically speaking, the resulting procedure converges efficiently and can drastically reduce the 
variance. It is possible to extend the methods to portfolio whose losses depend on a general 
diffusion process, using Girsanov transform to introduce a potentially infinite dimensional variance 
reducer. Finally, we aim at extending the method by implementing low-discrepancy sequences in 
our procedure instead of pseudo-random numbers. Preliminary numerical experiments showed a 
significant improvement of the convergence rate . This also raises interesting theoretical problems. 
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See |25j for some first theoretical results in that direction in a one-dimensional framework and [13] 
for further developments in higher dimensional setting. 
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